Efficient shared-memory support for parallel graph reduction

Authors

  • Andrew J. Bennett
  • Paul H. J. Kelly
Abstract

This paper presents the results of a simulation study of cache coherency issues in parallel implementations of functional programming languages. Parallel graph reduction uses a heap shared between processors for all synchronisation and communication. We show that a high degree of spatial locality is often present and that the rate of synchronisation is much greater than for imperative programs. We propose a modified coherency protocol with static cache line ownership and show that this allows locality to be exploited to at least the level of a conventional protocol, but without the unnecessary serialisation and network transactions this usually causes. The new protocol avoids false sharing, and makes it possible to reduce the number of messages exchanged, but relies on increasing the size of the cache lines exchanged to do so. It is therefore of most benefit with a high-bandwidth interconnection network with relatively high communication latencies or message handling overheads.

Similar resources

GRIP - A high-performance architecture for parallel graph reduction

GRIP is a high-performance parallel machine designed to execute functional programs using supercombinator graph reduction. It uses a high-bandwidth bus to provide access to a large, distributed shared memory, using intelligent memory units and packet-switching protocols to increase the number of processors which the bus can support. GRIP is also being programmed to support parallel Prolog and D...

Speeding up Parallel Graph Coloring

This paper presents new efficient parallel algorithms for finding approximate solutions to graph coloring problems. We consider an existing shared memory parallel graph coloring algorithm and suggest several enhancements both in terms of ordering the vertices so as to minimize cache misses, and performing vertex-to-processor assignments based on graph partitioning instead of random allocation. ...

Parallel Combinator Reduction: Some Performance Bounds

A parallel graph reduction machine simulator is described. This performs combinator reduction and can simulate various different parallel reduction strategies. A number of functional programs are examined, and experimental results presented comparing the amount of parallelism obtainable using explicit divide-and-conquer with the maximum amount of parallelism available in the programs. Keywords...

Scheduling vs Communication in PELCR

PELCR is an environment for λ-terms reduction on parallel/distributed computing systems. The computation performed in this environment is a distributed graph rewriting and a major optimization to achieve efficient execution consists of a message aggregation technique exhibiting the potential for strong reduction of the communication overhead. In this paper we discuss the interaction between the...

Locality and False Sharing in Coherent-Cache Parallel Graph Reduction

Parallel graph reduction is a model for parallel program execution in which shared memory is used under a strict access regime with single assignment and blocking reads. We outline the design of an efficient and accurate multiprocessor simulation scheme and the results of a simulation study of the performance of a suite of benchmark programs operating under a cache coherency protocol that is rep...

Journal:
  • Future Generation Comp. Syst.

Volume 12

Published 1997